Sometimes it can be difficult to put all of the pieces together when they are dealt with separately, so here I want to demonstrate how we might approach analyzing a simple study in a Bayesian context. First, let’s explain the system and research question.
The system and question
My former M.S. student, Mitch Le Sage, was interested in the role that scavenging invertebrates might play in the transmission and dynamics of Ranavirus epidemics. His general hypothesis was that by removing infectious carcasses, scavengers might reduce transmission rates among amphibian larvae in ponds.
One question that came out of this research was whether invertebrate scavengers, such as dytiscid beetle larvae, were capable of removing many amphibian carcasses. Eating one carcass in a lab setting is one thing, but what if there were a lot of carcasses, as in a die-off? Would the scavengers be able to keep up?
Mitch and an undergraduate in the lab set up an experiment where dytiscid beetle larvae were housed in small aquaria with 1, 2, 5, 8, or 10 carcasses of Ambystoma macrodactylum larvae to see how much carcass mass was consumed in a 48-hour period. Since carcasses are scavenged, rather than consumed whole, he measured the starting and ending mass of the carcasses.
This sort of experiment falls reasonably well under the heading of “Functional Responses”, or perhaps you have heard of the “Holling Disc equations.” The history of these is pretty fun, but for our purposes we can just consider two versions of how the foraging (or scavenging) rate increases with the availability of food (or carcasses).
First, we might simply assume that the foraging rate increases with the density of food. If there are more carcasses in the environment a scavenger will find one to consume more quickly. We could write:
\[
\mu_i = a \times c_i,
\] meaning that the average or expected amount of carcass mass consumed in a particular aquarium (\(\mu_i\)) increases linearly with the amount of carcass mass available (\(c_i\)) at rate \(a\). The parameter \(a\) is often called the “attack rate.” This is called, very creatively, the Type I functional response.
Secondly, we might assume that the rate of carcass removal saturates or levels off simply because a scavenger can only eat carcass so fast. At low carcass densities the scavenging rate is limited by the time it takes to find a carcass, but when carcasses are abundant the rate of removal is instead limited by handling time, \(h\). A classic way to frame this so-called Type II functional response is:
\[
\mu_i = \frac{a \times c_i}{1+a\times c_i\times h}
\] We wanted to see which of these models best described carcass removal and then to extrapolate our findings, including our parameter uncertainty, to the pond-level.
Simulating data from the Type I functional response
Let’s start not by looking at their data, but instead by making up some data. We’ll begin with just the linear model, \(\mu_i = ac_i,\) but can then repeat the pattern with the second model.
First, our “predictor” data: the starting masses ranged from a low of ~1/5th of a gram to ~10 grams, so we’ll simulate a data set over this range of carcass values. Since we used 18 aquaria in the study, we’ll use that many observations in our simulations, too.
n <- 18  # number of aquaria
carc <- runif(n, min = 0.2, max = 10)
carc
So these are some made-up observations of initial carcass masses. There are two things that are quite unrealistic. First, we have a huge number of digits to the right of the decimal place, but that’s probably not something to worry about. Second, since we actually added 1, 2, 5, 8, or 10 carcasses to a tank it’s probably unrealistic to assume that all carcass masses are equally likely, even though the carcasses varied a bit in size. We could just run with this—it may not matter a lot—or we could more realistically simulate our carcass masses.
Indeed, looking back at the paper, we actually used different numbers of replicates for each number of carcasses, so let’s try to make this more realistic.
# actual numbers of carcasses used in each replicate
n_carc <- c(rep(1, 5), rep(2, 5), rep(5, 3), rep(8, 3), rep(10, 2))
# now let's draw random carcass sizes
carc <- rnorm(n = 18, mean = 1*n_carc, sd = 0.25)  # mean of 1 g/carcass
carc
That actually does a reasonable job of recapitulating the actual carcass masses!
Next, we need to simulate how much of the initial carcass mass is removed. Let’s pick an attack rate. You may not know much about what this should be, but you can at least make some basic assumptions. It cannot be greater than 1, as that would imply that more carcasses are eaten than available. It must be greater than zero. So we can at least say that \(1 \geq a \geq 0\). If we thought some more about the math and the system we could probably do better—for instance, only a fraction of the carcass is scavengable; the cartilage is probably not—but for a start, let’s just say \(a=1/2\).
a <- 0.5  # attack rate in g/48h
mu <- a*carc
Let’s plot our new “data”.
plot(mu ~ carc)
And…it’s a line. I guess that shouldn’t be surprising, right? We just specified that it would be! But our data, that is our observations, should not follow exactly along this predicted or expected line, right? I mean, we’d expect a bit of noise both from measuring carcass remains and from differences in, among other things, the appetite of a dytiscid beetle larva, how good or bad it is at finding carcasses, and the vagaries of where the carcasses end up relative to the beetle larva. Let’s therefore simulate observations from our expectation.
I would think that these slight variations are probably normally distributed. Unless we have a good reason to think otherwise, this is usually a reasonable starting point. We have our mean or expected value, \(\mu_i\), but we need to describe some variation from this expectation. As a start, let’s say \(\sigma = 1\).
obs <- rnorm(n, mean = mu, sd = 1)
plot(obs ~ carc, col = "red")  # observations
lines(mu ~ carc)               # "expected" relationship
abline(a = 0, b = 1, lty = 2)  # a 1:1 line just for context
That is a fair bit of variation! Moreover, some values are now going negative! That won’t work!
So what are our options? Well, we could turn down the noise (standard deviation, \(\sigma\)) to ensure that no observations are less than zero. We could also just set every observation that is less than zero to zero. However, both of these strategies amount to imposing arbitrary limits on the variability in observations. Plus, with ad hoc approaches like the second one, coding things gets hard (especially in the language of our Bayesian models). A better option would be to use a distribution for the observations that ensures positive values.
One that comes readily to mind is the lognormal. Indeed, it is just a normal distribution that is exponentiated. Let’s see what that looks like:
# Normal distribution
hist(rnorm(1e4), breaks = 50)
# exponentiated normal distribution
hist(exp(rnorm(1e4)), breaks = 50, border = "blue", col = NA)
# the lognormal distribution is the same as the exponentiated normal
hist(rlnorm(1e4), breaks = 50, border = "red", col = NA, add = TRUE)
See?
Note, however, that converting between the standard deviation of the normal and lognormal is a bit weird. A standard deviation of 1 on the normal, linear scale would imply a standard deviation of \(\log(1) = 0\) on the log scale, but that would mean no deviation at all! Anyway, this is why we simulate: so we can get a sense of what these numbers imply.
obs <- rlnorm(n, meanlog = log(mu), sdlog = 1/2)
plot(obs ~ carc, col = "red")  # observations
lines(mu ~ carc)               # predicted line
abline(a = 0, b = 1, lty = 2)  # a 1:1 line just for context
So that’s a bit better, but perhaps too much noise. Let’s tone it down a bit.
obs <- rlnorm(n, meanlog = log(mu), sdlog = 1/3)
plot(obs ~ carc, col = "red")     # observations
lines(mu ~ carc)                  # predicted line
abline(a = 0, b = 1, lty = 2)     # a 1:1 line just for context
points(obs ~ carc, col = "blue")  # new observations
Any points above the 1:1 line might be interpreted as measurement error on the initial or final carcass measurements. It’s probably not worth getting too worked up about these issues; we have data that are fairly realistic and avoid real issues like negative values.
Simulating across possible values of \(a\) (prior prediction)
So far we have simply picked a single value of the parameters \(a\) and \(\sigma_{\log}\) (my fancy name for the sdlog parameter in the rlnorm() function call, above). But we do not necessarily know what these values are. Instead, we can simulate different data sets from different underlying expected relationships between the initial and consumed carcass mass based on different values of the parameters. Seem like a lot? It’s not too bad.
Let’s start by focusing on simulating a bunch of values of \(a\) and seeing what those imply about the expected relationship between initial and consumed carcass mass. We can keep our guess of 0.5 as maybe our mean, but allow these values to change a bit around this value.
as <- rnorm(n = 20, mean = 0.5, sd = 0.25)  # twenty values of a
Let’s see what these look like:
plot(x = carc, y = carc, type = "n")  # just to set the scale of the graph
abline(a = 0, b = 1, lty = 2)         # one-to-one line (abline's b argument is the slope)
for(i in 1:20){
  abline(a = 0, b = as[i], col = "gray")
}
So we have twenty different possible values of the attack rate, \(a\), which is the slope of these lines. They cover a wide range of possibilities, from (in my simulation) almost zero to almost one, but most are in the middle. I guess without knowing any more than this, the results seem pretty OK to me. Now, however, we would like to simulate different data sets given these different possible values of the attack rate. Let’s start with 10 different data sets, though we could of course change this pretty easily.
We want to construct a data frame with ten data sets each with 18 observations that come from a different possible value of \(a\) and \(\sigma_{\log}\). Let’s build this up.
df <- data.frame(
  # our index of the data set number
  d = rep(1:10, each = 18),
  # the initial carcass mass; same as above, only we replicate
  # the n_carc vector 10 times, once for each data set
  carc = rnorm(n = 18*10, mean = 1*rep(n_carc, times = 10), sd = 0.25)
)
head(df)
hist(df$carc, breaks = 30)  # can see the peaks in carcass mass around 1-3, 5, 8, and 10 carcasses
This data frame so far just has our predictor or independent variable, initial carcass weights.
Now we need to match our values of \(a\) with our different replicate data sets, one value for each data set, and use that to calculate the expected values.
df$a <- as[df$d]
See what I did? as is a vector of possible values of the parameter \(a\). I want to use the first value in that vector for the first replicate data set, the second value for the second data set, and so on. Since the d column is the index of the replicate we can just use that to pull out the appropriate entry.
We can then calculate the expected value of the carcasses consumed, mu, for each value of carc using the replicate-specific value of a
df$mu <- df$carc * df$a
plot(mu ~ carc, data = df)
abline(a = 0, b = 1, lty = 2)  # one-to-one line
for(i in 1:20){
  abline(a = 0, b = as[i], col = "gray")
}
So we can see that we now have 10 data sets with different initial carcass densities (\(x\)-values) and different slopes (values of \(a\)). However, we still do not yet have data sets of observations. For that, we need to simulate observations from the expected values given some level of noise. We were using a lognormal distribution, but above we had just used a single value of sdlog = 1/2 (\(\sigma_{\log}\)). We do not know that this value is right. Shouldn’t we explore the effect of different possible values of this parameter, too?
Since \(\sigma_{\log}\) must be greater than zero, but shouldn’t go to crazy high values, let’s assume it is exponentially distributed with a mean of 1/2. The exponential distribution has a single parameter, \(\lambda\), often called the rate parameter. The average of the distribution is simply \(1/\lambda\), so for a mean of 1/2 we should use rate = 2.
# possible values of sdlog
hist(rexp(n = 1e4, rate = 2), breaks = 20)
We can use the same trick as above to assign a different value of \(\sigma_{\log}\) to each replicate data set.
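The chunk doing this isn't shown here, but since a `sdlogs` vector is referenced again further below, it presumably looked something like this sketch (the exact object names and the simulation of `df$obs` are assumptions, mirroring the code around it):

```r
# draw one sdlog value per replicate data set from the Exponential(2) distribution
sdlogs <- rexp(n = 10, rate = 2)
# the "trick": index by the data set number to assign one value per replicate
df$sdlog <- sdlogs[df$d]
# simulate lognormal observations around the expected values
df$obs <- rlnorm(n = nrow(df), meanlog = log(df$mu), sdlog = df$sdlog)
```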
If we plot these observations (in red) we can see what our parameters (\(a\) and \(\sigma_{\log}\)) say can happen:
plot(mu ~ carc, data = df)     # expected values | a
abline(a = 0, b = 1, lty = 2)  # one-to-one line
for(i in 1:10){
  abline(a = 0, b = as[i], col = "gray")
}
points(obs ~ carc, data = df, col = "red")  # simulated observations
That is a lot, but it is clear that we sometimes get observations well above the one-to-one (dashed) line.
We can get a bit more of a sense of this if we plot each data set separately.
# par(ask = TRUE)  # <- asks you to hit return between each new graph
for(i in 1:10){
  plot(obs ~ carc, data = df, type = "n")  # for consistent axis limits
  points(obs ~ carc, data = df[df$d == i, ])
  abline(a = 0, b = 1, lty = 2)
}
# par(ask = FALSE)  # reset
We can see that the problems occur more often when we get larger values of sdlog used in simulating observations. We have a couple of choices. We could use a different distribution to describe the possible values of sdlog, which, again, just says how much variation we should see around the expected value, \(\mu\). Options include the gamma or beta distributions, which don’t have such long tails. Our second choice would be to say, you know, we’re making up stuff to see if our code works and to better understand what is reasonable; let’s file this away as sort of problematic, but probably not terribly important. I suggest we take option two for the moment so we don’t get too caught up in things that don’t matter a great deal. Besides, we can always come back to this if we want.
Simulating data from the Type II functional response
Again, our type II function response is of the form: \[
\mu_i = \frac{a \times c_i}{1+a \times c_i \times h},
\]
where \(h\) represents handling time, but everything else is the same. So this should be quick.
We can constrain handling time to be \(\geq 0\) from first principles, and I think we can also say that handling time is unlikely to be super high; that is, it is unlikely that the handling time to consume a carcass is on the order of the duration of the experiment (48 h). We can therefore assume \(1 > h > 0\).
Let’s try simulating some values. I’m going to use the curve function, which is sort of like abline() except that it plots an arbitrary function instead of just a line.
# attack rate is 0.5
# handling time is 0.2 (20% of the duration of the experiment)
# x is the initial carcass mass
curve(0.5*x/(1 + 0.5*x*0.2), from = 0, to = 10)
abline(a = 0, b = 1, lty = 2)
It seems that this line curves away from the 1:1 line a bit, but not super fast. Play with this a bit to get a sense of how handling time affects the shape of the curve. I think you’ll see that as \(h \rightarrow 0\) this line becomes more and more like the Type I functional response.
Now let’s allow for variation in \(h\). I’m going to use a beta distribution because it is bounded between zero and one and allows some control over the shape. I’m just tweaking the shape1 and shape2 parameters (sometimes called \(\alpha\) and \(\beta\) in more theoretical frameworks) until I get something that is skewed towards low values, but does not exclude higher values.
hs <- rbeta(n = 10, shape1 = 1, shape2 = 3)
plot(0:10, 0:10, type = "n")  # set axis boundaries
abline(a = 0, b = 1)
for(i in 1:10){
  curve(as[i]*x/(1 + as[i]*x*hs[i]), add = TRUE, col = "gray")
}
If we’re happy with these values, and I guess I am, we can then use them to generate expected values for any given initial carcass mass, like we did before with the Type I functional response. In fact, let’s just recycle what we have and add a column to the data frame for h, which we can then use to generate mu2 that we can use to generate obs2.
Note that we do not have to use the same values of carc, a, and sdlog, but doing so can be useful because we know what, precisely, is changed between mu and mu2, or obs and obs2, which can help us understand the consequences of different parameters or functional response.
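The actual chunk isn't shown, but recycling the data frame might look like this sketch; it assumes the per-replicate \(\sigma_{\log}\) values were stored in a `df$sdlog` column using the same indexing trick as before:

```r
# a sketch: add replicate-specific handling times via the data set index
df$h <- hs[df$d]
# Type II expectation for each row
df$mu2 <- df$a*df$carc/(1 + df$a*df$carc*df$h)
# new simulated observations around the Type II expectation
# (assumes df$sdlog holds the replicate-specific sdlog values)
df$obs2 <- rlnorm(n = nrow(df), meanlog = log(df$mu2), sdlog = df$sdlog)
```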
As long as we kept track of the variable names, that was actually pretty simple, right?
plot(mu2 ~ carc, data = df)    # expected values | a, h
abline(a = 0, b = 1, lty = 2)  # one-to-one line
for(i in 1:10){
  curve(as[i]*x/(1 + as[i]*x*hs[i]), col = "gray", add = TRUE)
}
points(obs2 ~ carc, data = df, col = "red")  # simulated observations
We still have a bit of trouble with some “observations” ending up with more carcass consumed than they started with—above the 1:1 line—but this is no worse than in our prior simulation of the type I functional response. Overall, it seems like this is working reasonably well!
From simulation to Bayesian model
So, as McElreath claims, and I have found, if you can simulate data you have essentially specified the components of a Bayesian model. Let me demonstrate, first with the Type I functional response model.
While we started with the parameters and then worked our way to observations in the simulation process, we specify the model in the reverse order, beginning with the description of the distribution of data. We simulated observations from a lognormal (using rlnorm()) given an expected value of \(\log(\mu)\) on the logarithmic scale, which I am going to call \(\mu_{\log}\) for consistency and transparency, and a standard deviation we called \(\sigma_{\log}\), so we write: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i)
\end{align}
\]
Now we developed our expectation, \(\mu_i\), from the Type I functional response, which was just the attack rate, \(a\), times the initial carcass mass, \(c_i\). Again, we can add this to our model as: \[
\mu_i = a c_i
\]
That is everything that describes our data. We simply need to specify our priors. We used distributions with particular parameters to simulate our data. We can simply use these same distributions and parameters as our priors!
\[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = a c_i \\
a & \sim \text{Normal}(0.5, 0.25) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\] That is it. That is our Bayesian model, defined! Not so bad, right?
We can similarly define the second model for the Type II functional response: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = \frac{a c_i}{1+a h c_i } \\
a & \sim \text{Normal}(0.5, 0.25) \\
h & \sim \text{Beta}(1,3) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\]
Fitting the Bayesian model to the simulated data: Grid approximation
I’m sure you are still expecting this to be hard once we fit the model to our data. I’m here to assure you that it does not need to be very hard. We’ll keep this simple and use grid approximation. Since we have two parameters to worry about, things might look a bit more complicated, but it’s really quite similar. We just need to include the priors of both parameters in the numerator: \[
\Pr(a, \sigma_{\log} | obs) = \frac{\text{Pr}(obs| a, \sigma_{\log})\times \text{Pr}(a)\times\text{Pr}(\sigma_{\log})}{\text{Pr}(obs)}
\]
To do this calculation for realsies the steps are:
1. Establish a (two-dimensional) grid of values of our parameters, \(a\) and \(\sigma_{\log}\). This grid is what we are going to be calculating our likelihood and priors over, once for each point on the grid.
2. Calculate the likelihood of observing the data given each combination of parameters (i.e., for each point on the two-dimensional grid).
3. Multiply the likelihood by the prior probability of each combination of parameters to obtain the posterior-ish distribution.
4. Standardize this posterior-ish distribution by dividing each entry by the sum of all of the values in the posterior-ish matrix.
There are two complications. First, we’re going to use dnorm() and dexp() to do the calculations of the prior probability of each particular value of \(a\) and \(\sigma_{\log}\), respectively. Remember, these functions simply tell us the probability of any particular value given the values of mean and sd or rate that we supply, and we have already chosen the values of these for our priors! (Look above to refresh your memories if you need.)
Second, multiplying lots of small numbers (probabilities are less than one, after all) leads to all sorts of rounding errors. It turns out that it is much simpler to add the logs of numbers and then exponentiate the result. That is, \(a \times b \times c = \exp[\log(a) + \log(b) + \log(c)]\), and the version on the right-hand side is numerically better for a computer. So we’re going to do that.
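A quick illustration of the underflow problem (a toy example, not from the original analysis):

```r
p <- rep(1e-20, 20)  # twenty small probabilities
prod(p)              # their product underflows to exactly 0 in double precision
sum(log(p))          # the log of that product is perfectly representable (about -921)
```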
OK, onward! But first, let’s specify a data set! We’ll just use the first of the 10 we generated.
obs <- df$obs[df$d == 1]
carc <- df$carc[df$d == 1]
Now, step 1: establish a grid of possible values of our parameters using the expand.grid() function, which constructs a data frame of every combination of the different vectors you give it.
pars <- expand.grid(
  a = seq(from = 0, to = 1, length.out = 100),
  sdlog = seq(from = 0, to = 2, length.out = 100)
)
Step 2: calculate the likelihood of observing the data given each combination of parameters. Actually, we’re going to calculate the log-likelihood for each observation and then add up these log-likelihoods at each combination of parameters. Also, we’re going to use sapply() to do this for each row of our pars data frame.
LL <- sapply(X = 1:nrow(pars), FUN = function(i) {
  sum(dlnorm(x = obs,
             meanlog = log(pars$a[i]*carc),  # mu_log
             sdlog = pars$sdlog[i],
             log = TRUE  # to calculate the log-likelihood
  ))
})
You might look at the head() and tail() of this LL vector we created. Notice that some values are -Inf. See what exponentiating these values produces. Does that make sense?
Step 3: multiply the likelihood by the prior probability of each combination of parameters to obtain the posterior-ish distribution. But remember, we’re working on the log-scale, so we want to calculate the log-probability of each value of each parameter across the grid and then add them to the log-likelihoods we just calculated.
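The chunk computing `post_ish` is not shown above, but mirroring the Type II version later in the post, it was presumably something like:

```r
# log-posterior-ish = log-likelihood + log-priors, all added on the log scale
post_ish <- LL +
  dnorm(pars$a, mean = 0.5, sd = 0.25, log = TRUE) +  # log-prior on a
  dexp(pars$sdlog, rate = 2, log = TRUE)              # log-prior on sdlog
```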
We now need to exponentiate these log-probabilities, but we need to worry about very small values being rounded to zero. McElreath’s solution is to first scale all of the log-products by subtracting the maximum log-product. This means that what we get out is not exactly a probability distribution, but the values are relative (or proportional to) posterior probabilities. I’m just going to trust him on this.
post <- exp(post_ish - max(post_ish))
Working with the posterior
Right…so what do we do with this? Well, we can inspect the parameter distributions. Indeed, we probably want to see if we recaptured the True values of the parameters, the ones we simulated from. This is a good internal check to make sure everything is working.
Internal consistency checks
The simplest way to do this is simply to draw out samples of the parameters relative to their (posterior) probability. With other engines for Bayesian models we’ll have our samples from the posterior as output, directly, but with a grid approximation we need to produce them. Let’s first draw rows (or index numbers) of the posterior distribution, post, in proportion to their probabilities.
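Something like the following does the trick (the draw count of 10,000 is an assumption, though the later code indexes sample.a[1:1e4], so at least that many draws were taken):

```r
# draw grid-row indices in proportion to their relative posterior probabilities
samples <- sample(x = 1:nrow(pars), size = 1e4, replace = TRUE, prob = post)
```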
We can then use these indexes to find the parameter values that are associated with these most probable parts of the distribution. This is like describing the shape of a mountain by taking samples of positions in space based on the height of those positions and then subsequently finding the coordinates of those points. Or, the short version: we now know the rows of the parameter data frame that hold the most probable parameter values.
sample.a <- pars$a[samples]  # see? indexing draws out the values of a
sample.sdlog <- pars$sdlog[samples]
Now we can plot histograms or density plots of these parameter samples. I’m going to ensure this covers the full range of values we considered in our grid approximation:
hist(sample.a, freq = FALSE, breaks = 0:100/100)
Depending on the data set and underlying parameters, this posterior distribution may look truncated on its edge. That would be because we only considered possible values from zero to one, so the posterior might be truncated by the way we ran the model, not because of any weird parameter magic.
It can be helpful to see how the posterior differs from our prior expectation of the attack rate parameter, \(a\).
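A sketch of that comparison, overlaying the Normal(0.5, 0.25) prior density on the posterior samples (the color choice is mine):

```r
hist(sample.a, freq = FALSE, breaks = 0:100/100)                  # posterior samples
curve(dnorm(x, mean = 0.5, sd = 0.25), add = TRUE, col = "blue")  # the prior on a
```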
Notice how the data pulled our rather vague expectations about what the attack rate might be into a much more narrow, tall distribution about the values of \(a\) that are most consistent with the prior and data. Think of it as our golem learning from the data. It seems to have learned a lot!
But did it learn the right things? Recall that we actually know the True value of a used to simulate the data. Does our posterior capture this value? Let’s add in a vertical red line to show that.
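Since we fit the first simulated data set, its True attack rate is the first entry of the as vector:

```r
hist(sample.a, freq = FALSE, breaks = 0:100/100)
abline(v = as[1], col = "red")  # the True value of a used to simulate data set 1
```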
Again, depending on the random values of a, the red line might not fall in the exact center of the posterior—because of the prior, our golem was skeptical of values near zero or one (and beyond), even if those values were True—but it should be included somewhere in there.
We can do the same for the other parameter, \(\sigma_{\log}\).
hist(sample.sdlog, freq = FALSE, breaks = 0:200/100)
curve(dexp(x, rate = 2), add = TRUE)  # the Exponential(2) prior
abline(v = sdlogs[1], col = "red")    # True value for data set 1
Again, we can see that our golem learned a lot about the variation in the data, and the range of values it thinks are consistent with the priors and data (i.e., the posterior distribution) should include the True value, even if it isn’t the most probable value.
The influences of parameters are often not independent of one another, so it can be useful to see the probability of combinations of parameters (although not so much in this case). In other words, it can be useful to plot the priors and posteriors together so we can see how the parameters co-vary.
Working in base R graphics is a pain, but McElreath has some handy functions that help, for instance, this contour plot.
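A sketch, assuming the rethinking package's contour_xyz() helper (which takes x, y, and z vectors over a grid) is the function in question:

```r
# contour plot of the relative posterior over the (a, sdlog) grid
rethinking::contour_xyz(pars$a, pars$sdlog, post)
```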
Note that since we didn’t have the prior probabilities of each combination of parameters stored anywhere, we had to calculate them. We can also see how much our posterior changed from the prior.
We can also calculate means and modes and intervals, just like you often get for parameter estimates.
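For example (using rethinking's PI(), which also appears later in this post; the exact summaries shown in the original are an assumption):

```r
mean(sample.a)                        # posterior mean of a
median(sample.a)                      # posterior median
rethinking::PI(sample.a, prob = 0.9)  # central 90% interval
```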
We can also plot the relationships between initial and consumed carcass densities that are most consistent with the priors and data (i.e., most likely in the posterior). In this case, it’s pretty simple because our expected value, \(\mu_i\), is just a straight line, so I’ll use the short-cut of the abline() function.
plot(x = 0:10, y = 0:10, type = "l", lty = 2)
# some draws from the prior of a
for(i in 1:20){
  abline(a = 0, b = as[i], col = rgb(0, 0, 1, alpha = 0.2))
}
# draws from the posterior of a
for(i in 1:100){
  abline(a = 0, b = sample.a[i], col = rgb(0, 0, 0, alpha = 0.2))
}
We can see that the relationship is much narrower than it might have been.
Posterior predictive checks
It can be useful to plot the range of likely values of the observations. After all, we might get the gist of the relationship right, but really miss the mark in the data-generating process. For instance, perhaps our actual (simulated) observations often lie outside of the range of what our golem expects. These more extreme observations would tend to pull our golem’s understanding to these extremes. More on this in the future. But for now, let’s see how we could produce and plot possible observations.
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)
# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a and sdlog
possible_obs <- sapply(X = new_carcs, FUN = function(x) {
  rlnorm(n = 1e4,
         meanlog = log(x*sample.a[1:1e4]),
         sdlog = sample.sdlog[1:1e4])
})
If you inspect this (e.g., head(possible_obs)) you will see that there are 50 columns representing the 50 positions along the x-axis, each with 10,000 possible observations given 10,000 draws from the \(a\) and \(\sigma_{\log}\) posteriors. That might be a bit too much to plot, but we can use a quantile function to find the 5th and 95th percentiles, or similar, among the observations at each point along the x-axis.
If you inspect this you see the range of the internal 90% of possible observations. However, it is wide rather than long. Thankfully, R makes it easy to transpose a matrix:
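The corresponding chunk isn't shown, but mirroring the Type II code at the end of the post it was presumably:

```r
# at each x position find the inner 90% interval of possible observations
interval <- apply(X = possible_obs, MARGIN = 2, FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame, then add in the x-values
pred_obs <- as.data.frame(t(interval))
pred_obs$carc <- new_carcs
```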
and the predicted line based on the mean value of \(a\) from the posterior
pred_obs$pred <- mean(sample.a)*pred_obs$carc
We can plot this posterior predictive interval and then superimpose our actual (simulated) observations.
# rename 5% and 95% since R will choke on those
names(pred_obs)[1:2] <- c("low", "hi")
plot(0:10, 0:10, type = "l", lty = 2)  # 1:1 line
lines(pred ~ carc, data = pred_obs, col = "red")          # mean prediction
lines(low ~ carc, data = pred_obs, col = "red", lty = 3)  # lower
lines(hi ~ carc, data = pred_obs, col = "red", lty = 3)   # upper
# plot actual (simulated) observations
points(x = carc, y = obs)
We should see that most, but not necessarily all of the observations fall within our prediction interval. We should also not see systematic deviations from what the model says is likely, which would indicate some sort of bias or model misspecification.
Hint
We can follow the same methods to simulate possible observations from the prior distribution, as well. It is one thing to get the average or expected values about right based on the priors, but sometimes we completely flub the part of the model that specifies the probability of observing particular observations, making the noise much too small or much too large.
So at this point it seems that the model can reasonably recover the True values of the parameters (at least in my simulation) and seems to produce data that look like our (simulated) data. Does this work repeatedly for different data sets? It would be easy to check. Simply change the data set (e.g., df[df$d == 2, ]) and re-run the code. Ideally our model will work over a variety of possible data sets. If there are edge cases where things break, it can be useful to sort out why. Was it because of our priors? Were there limitations or implications of the way we produced expected values (i.e., \(\mu_i\)) that are coming to bite us? In essence, we want to have some confidence that our model is working how we think it should work, even in the edge cases.
Perspective
Is this a lot of work? Perhaps it is, but by this point we have a lot of confidence that we understand what we can expect and that our model is working as designed. It may feel like overkill in a simple case like this, but it is essential for more complex models where our intuition often fails us and simple inspection of parameter values yield minimal insights. So let’s practice in the simple case, getting a sense of the mechanics, before we move into less-charted waters. And anyway, the tools we (will) build to simplify things here should work in these more complex cases. If nothing else, I think working through these steps helps keep us in the driver’s seat of our analyses.
Repeating the process with the Type II functional response
We’ll repeat the process of producing, fitting, and checking our model with the Type II functional response. It will be quicker, but hopefully this repetition makes things feel more familiar. Again, the model is: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = \frac{a c_i}{1+a h c_i } \\
a & \sim \text{Normal}(0.5, 0.25) \\
h & \sim \text{Beta}(1,3) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\]
We’ll use the observations from the first simulated data set as our data.
obs2 <- df$obs2[df$d == 1]
Step 1: establish a grid of possible values of our parameters using expand.grid(). We now have three parameters to consider, \(a\), \(h\), and \(\sigma_{\log}\). Since we are using 100 points along each axis we have \(100^3 = 1,000,000\) combinations to consider.
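The grid itself isn't shown, but given the constraints above (\(a\) and \(h\) between 0 and 1, and the same 0 to 2 range for \(\sigma_{\log}\) as before), it was presumably:

```r
pars2 <- expand.grid(
  a = seq(from = 0, to = 1, length.out = 100),
  h = seq(from = 0, to = 1, length.out = 100),
  sdlog = seq(from = 0, to = 2, length.out = 100)
)
nrow(pars2)  # 1,000,000 combinations
```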
Note that I am labeling everything with a 2 after it. This is just so we do not overwrite our prior model and related things, which allows us to work with both models. Just be careful copy-pasting!
Step 2: calculate the log-likelihood of observing the data given each combination of parameters.
LL2 <- sapply(X = 1:nrow(pars2), FUN = function(i) {
  sum(dlnorm(x = obs2, 
             meanlog = log(pars2$a[i]*carc/(1 + pars2$a[i]*carc*pars2$h[i])), # mu_log
             sdlog = pars2$sdlog[i],
             log = TRUE # to calculate the log-likelihood
             )
      )
})
All we did was update our expected value based on the new functional form. Everything else is the same. Still, since we have 100x as many parameter combinations to consider I’m guessing that this operation took a few beats to calculate!
Step 3: multiply the likelihood by the prior probability of each combination of parameters to obtain the posterior-ish distribution.
post_ish2 <- LL2 + 
  dnorm(pars2$a, mean = 0.5, sd = 0.25, log = TRUE) + 
  dbeta(pars2$h, shape1 = 1, shape2 = 3, log = TRUE) + # this is the only new part
  dexp(pars2$sdlog, rate = 2, log = TRUE)
Now we exponentiate these log-probabilities, first scaling all of the entries by the maximum log-product.
post2 <- exp(post_ish2 - max(post_ish2))
Now it is time to find our samples from the posterior. (Be careful about the relabeling.)
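As a sketch of that sampling step (assuming pars2 and post2 from the steps above), we draw row indices of the parameter grid in proportion to the posterior weights, just as in the Type I version; the variable names match those used below.

```r
# draw rows of the parameter grid in proportion to their posterior weight
rows2 <- sample(x = 1:nrow(pars2), size = 1e4, replace = TRUE, prob = post2)
sample2.a     <- pars2$a[rows2]
sample2.h     <- pars2$h[rows2]
sample2.sdlog <- pars2$sdlog[rows2]
```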
We can also see the relationship between \(a\) and \(h\) in the posterior7.
# posterior
plot(sample2.a, sample2.h, pch = 16, col = rgb(0, 0, 1, 0.05))
Notice that there is a positive relationship such that larger estimates of attack rates are associated with larger estimates of handling time. Can you think of why?
Let’s now plot our understanding of the relationships between initial and consumed carcasses most consistent with the prior and data.
plot(x = 0:10, y = 0:10, type = "l", lty = 2)
# some draws from the prior of a
for(i in 1:20){
  curve(as[i]*x/(1 + as[i]*x*hs[i]), col = rgb(0, 0, 1, alpha = 0.2), add = TRUE)
}
# draws from the posterior 
for(i in 1:1000){
  curve(sample2.a[i]*x/(1 + sample2.a[i]*x*sample2.h[i]), col = rgb(0, 0, 0, alpha = 0.05), add = TRUE)
}
And finally, we can calculate and plot a posterior predictive interval describing the range of observations most consistent with our prior and data.
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)

# for each value along the x-axis, generate 10,000 possible observations
# each based on a draw of a, h, and sdlog
possible_obs2 <- sapply(X = new_carcs, FUN = function(x) {
  rlnorm(n = 1e4, 
         meanlog = log(x*sample2.a[1:1e4]/(1 + x*sample2.a[1:1e4]*sample2.h[1:1e4])), 
         sdlog = sample2.sdlog[1:1e4])
})
# at each position then find the upper and lower 90% interval
interval2 <- apply(X = possible_obs2, MARGIN = 2, FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_obs2 <- as.data.frame(t(interval2))
# add in the x-values,
pred_obs2$carc <- new_carcs
# and the predicted line based on the mean value of a & h from the posterior
pred_obs2$pred <- mean(sample2.a)*pred_obs2$carc /
  (1 + mean(sample2.a)*pred_obs2$carc*mean(sample2.h))

# plot posterior predictive interval
# rename 5% and 95% since R will choke on those
names(pred_obs2)[1:2] <- c("low", "hi")
plot(0:10, 0:10, type = "l", lty = 2)                     # 1:1 line
lines(pred ~ carc, data = pred_obs2, col = "red")         # mean prediction
lines(low ~ carc, data = pred_obs2, col = "red", lty = 3) # lower
lines(hi ~ carc, data = pred_obs2, col = "red", lty = 3)  # upper
# plot actual (simulated) observations
points(x = carc, y = obs2)
The vast majority of that code is the same. So, again, as long as you keep track of what’s what, you don’t need to reinvent the wheel each time you construct or run a model.
Fitting each model to the other’s data
So far we have fit each model to the data it produced, as a check on how well it works. But in the end, we are hoping to fit both models to the same real data and see which model is most consistent with those real observations. So it seems useful to fit each model to observations produced by the other model. In particular, let’s try fitting the Type I model to the data produced by the Type II model to see if we would even notice any systematic issues. (We’ll come back to metrics of model fit, later.)
Let’s simply re-fit the Type I model to the Type II data.
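As a sketch of what that re-fitting might look like, assuming the Type I grid (here called pars, with columns a and sdlog) is still in the workspace along with obs2 and carc, and assuming the Type I model used the same priors on \(a\) and \(\sigma_{\log}\) as before (adjust if your earlier setup differed; the object names LL1_2 and post1_2 are just my labels):

```r
# log-likelihood of the Type II data under the Type I (linear) expectation
LL1_2 <- sapply(X = 1:nrow(pars), FUN = function(i) {
  sum(dlnorm(x = obs2,                       # data simulated from the Type II model
             meanlog = log(pars$a[i]*carc),  # Type I expectation: mu_i = a * c_i
             sdlog = pars$sdlog[i],
             log = TRUE))
})
# add the log-priors, then exponentiate after scaling by the maximum
post_ish1_2 <- LL1_2 +
  dnorm(pars$a, mean = 0.5, sd = 0.25, log = TRUE) +
  dexp(pars$sdlog, rate = 2, log = TRUE)
post1_2 <- exp(post_ish1_2 - max(post_ish1_2))
```

From there the sampling and posterior predictive steps proceed exactly as above.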
MORE HERE
Real data!
OK, all of that was warm-up, model and tool development and testing. It was all meant so that we could be confident we knew how things worked. Now we are ready to fit the model to our actual data!
MORE HERE: fit both models, examine pars, plot posterior predictive plot for both models on same axes with data. Interpret result
Footnotes
To better understand how animals might forage, what strategies they would take, etc., he had undergraduate volunteers pick up discs of sand paper (I think… my recollection is imperfect) with tacks or their fingers while blindfolded, etc., counting how many they could get in a certain amount of time. He noticed the empirical patterns, which others then built some equations to describe and even explain; I don’t think he was that mathematically inclined. Anyway, they’ve been hugely influential and useful, including here.↩︎
There are actually a lot of equations that produce more or less similar curves, starting with different assumptions. The Michaelis-Menten equation is one common approach. It just goes to show you that a hypothesis can be represented by multiple models and a model can be consistent with multiple equations.↩︎
We can do these things, but they’re beyond our current scope.↩︎
If we wanted to enforce our consumed carcass values to be \(\leq\) the initial carcass values we could model the proportion of the carcass consumed and use a beta distribution to describe the variation.↩︎
NB: using the same plotting functions as before caused…issues, so we’ll just do this hacky version for now. Since we’re not going to keep using grid-approximation for too much longer it’s not worth spending too much time figuring out how to circumvent these issues.↩︎